Introduction and Definition of Nonlinearity

The purpose of this script is to compute a measure of linearity during reading of the short story.

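Linear navigation was defined as movement forwards in the text without skipping any pages. Nonlinear navigation, on the other hand, was defined as either a regression (any movement backwards in the text) or a forward leap (forward movement to a position beyond the next page, made either by turning pages at browsing speed or by using the progress bar).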

Browsing was used in the definition of nonlinearity because the intention behind browsing is to move between locations in the text rather than to read. Browsing speed was defined as spending less than one second on a page-view. One second is not enough time to read the text on a page; it only allows participants to get the gist of the information.
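
As a minimal illustration of the browsing threshold, the sketch below flags page-views on which less than one second was spent. The durations are made-up values in seconds, and the names page_view_seconds, browsing_threshold_seconds, and is_browsing are hypothetical. The speed labels used later in this script were assigned in the wrangling step, not by this code.

```r
# Hypothetical page-view durations in seconds (illustrative values only)
page_view_seconds <- c(0.4, 12.8, 0.9, 45.2, 1.0)

# A page-view counts as browsing when less than one second was spent on it
browsing_threshold_seconds <- 1
is_browsing <- page_view_seconds < browsing_threshold_seconds

is_browsing
```

Note that a page-view of exactly one second is not flagged, because the definition uses a strict "less than" comparison.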

To measure the frequency of nonlinear navigation, we calculate how often nonlinear navigation is initiated. Nonlinear navigation is initiated if a nonlinear event follows a linear navigation event, or if it follows a nonlinear event in a different direction (e.g. a forward leap following a regression) or one executed via a different method (e.g. a regression made with the progress bar following a regression made by turning pages backwards). Simply put, the current event (k) is nonlinear, and the linearity, method, or direction of the previous event (k-1) does not correspond with that of the current event. This measure therefore reflects the frequency of initiating nonlinear navigation during reading of the story.

Frequency of initiating nonlinearity was used as a measure of linearity of reading instead of the linearity categories (linear or nonnavigation, regression, or forward leap) to make sure that each navigation event is only counted once. Using the linearity category would inflate the amount of nonlinear navigation whenever a single navigation action spans multiple pages (e.g. a regression in which the participant goes multiple pages backwards by page turns).
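
The difference between counting nonlinear events and counting initiations can be sketched on a toy sequence. The example below is an illustration under simplifying assumptions: the sequence is hypothetical, all events belong to one user and reading session, and only the linearity category is compared (the navigation method is ignored).

```r
# Hypothetical sequence of linearity categories for consecutive events
linearity <- c(
    "LinearOrNonNavigation", "Regression", "Regression", "Regression",
    "LinearOrNonNavigation", "ForwardLeap", "Regression"
)
is_nonlinear <- linearity != "LinearOrNonNavigation"

# Category of the previous event (NA for the first event)
previous_linearity <- c(NA, head(linearity, -1))

# An event initiates nonlinearity if it is nonlinear and its category
# differs from the previous event's category (or there is no previous event)
starts_nonlinearity <- is_nonlinear &
    (is.na(previous_linearity) | previous_linearity != linearity)

sum(is_nonlinear) # counts every nonlinear event
sum(starts_nonlinearity) # counts each run of nonlinear events once
```

Here the three consecutive regressions contribute three nonlinear events but only one initiation, which is the inflation that the initiation count avoids.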

This script uses a dataframe that was wrangled in Prep_TrackingDataWrangling.Rmd.

The working directory is not changed with setwd() because this script is knit remotely in other scripts.

if (exists("ExternalAnalysisFilePath")) {
    # ExternalAnalysisFilePath: ~/Short_Story_Reading_Behaviour_Public/
    mypath_SSRB <- ExternalAnalysisFilePath
} else if (grepl("Prep", getwd())) {
    mypath_SSRB <- dirname(getwd())
} else if (grepl("Short_Story_Reading_Behaviour_Public", getwd())) {
    mypath_SSRB <- getwd()
} else {
    # get working directory manually
    mypath_SSRB <- paste0(
            dirname(getwd()),
            "/Short_Story_Reading_Behaviour_Public"
        )
}

Setup

library(tidyverse)
library(tidyr)
library(dplyr)
library(plotly) # interactive plots
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
getwd() # working directory should be ~/Short_Story_Reading_Behaviour_Public
## [1] "C:/Users/Pauliina/Documents/GITHUB/Short_Story_Reading_Behaviour_Public/Prep"

Load data and check variable types

To determine linearity, we use grouped_tracking_data that was created in Prep_TrackingDataWrangling.Rmd

grouped_tracking_data <-
    read.csv(
        paste0(
            mypath_SSRB,
            "/Data/wrangled_grouped_tracking_data.csv"
        ),
        header = TRUE,
        sep = ";",
        dec = ","
    )
grouped_tracking_data <- dplyr::select(grouped_tracking_data, -X, -X.1)
str(grouped_tracking_data)
## 'data.frame':    3175 obs. of  96 variables:
##  $ StoryId                        : int  7 7 7 7 7 7 7 7 7 7 ...
##  $ UserId                         : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ ReadingSessionNumber           : int  4 4 4 4 4 4 4 4 4 4 ...
##  $ EngagementTypeId               : int  1 2 2 2 2 2 3 4 4 4 ...
##  $ Id                             : int  7784 7785 7788 7789 7819 7821 7850 7851 7852 7853 ...
##  $ NavigationBlockNumber          : int  27 28 29 30 31 32 33 34 35 36 ...
##  $ BaselineSpeed                  : num  320 320 320 320 320 ...
##  $ IsBaselineSpeedAdjusted        : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ AdjustedBaselineSpeed          : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ IsIntrinsicCondition           : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ Date                           : chr  "2020-05-16" "2020-05-16" "2020-05-16" "2020-05-16" ...
##  $ Time                           : chr  "2022-09-29 11:03:13.290" "2022-09-29 11:03:14.109" "2022-09-29 11:03:21.053" "2022-09-29 11:03:21.409" ...
##  $ TimeBeforeDeadlinesDays        : num  3.99 3.99 3.99 3.99 3.98 ...
##  $ TimeBeforeDeadlineMinutes      : num  5741 5741 5741 5741 5728 ...
##  $ Type                           : chr  "openBook" "openPage" "keyboardBackward" "openPage" ...
##  $ StartLocation                  : int  0 0 861 0 16134 17126 31630 31630 31630 26583 ...
##  $ VisibleCharacterCount          : int  0 861 1032 861 992 1083 733 733 733 1074 ...
##  $ VisibleWordCount               : int  0 179 240 179 243 239 172 172 172 244 ...
##  $ PageInSection                  : int  0 1 2 1 16 17 31 31 31 26 ...
##  $ TotalPagesInSection            : int  0 31 31 31 31 31 31 31 31 31 ...
##  $ Device                         : chr  "Other" "Other" "Other" "Other" ...
##  $ OperatingSystem                : chr  "Windows" "Windows" "Windows" "Windows" ...
##  $ Browser                        : chr  "Chrome" "Chrome" "Chrome" "Chrome" ...
##  $ ReadingBlockNumber             : int  8 9 10 11 26 27 41 41 41 42 ...
##  $ IsBlurred                      : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ IsDialogOpen                   : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ IsMenuOpen                     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ IsInactive                     : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ IsReading                      : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ TimezoneOffset                 : int  -60 -60 -60 -60 -60 -60 -60 -60 -60 -60 ...
##  $ WindowHeight                   : int  578 578 578 578 578 578 578 578 578 578 ...
##  $ WindowWidth                    : int  1280 1280 1280 1280 1280 1280 1280 1280 1280 1280 ...
##  $ IsSelectionOpen                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ IsOpeningEvent                 : logi  TRUE FALSE FALSE FALSE FALSE FALSE ...
##  $ IsClosingEvent                 : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ IsPageOpen                     : logi  FALSE TRUE TRUE TRUE TRUE TRUE ...
##  $ IsDurationFixed                : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ ReadingBlockDuration           : num  0.0137 0.1046 0.0171 0.6495 4.4955 ...
##  $ EngagedReadingDuration         : num  0 0.1046 0.0171 0.6495 4.4955 ...
##  $ EngagedReadingSpeed            : num  NA 1711.8 14035.1 275.6 54.1 ...
##  $ EngagedSpeedLabel              : chr  "" "Scanning" "Scanning" "DeepReading" ...
##  $ Direction                      : chr  "" "" "Backward" "" ...
##  $ NavigationBlockDirection       : chr  "" "Forward" "Backward" "Forward" ...
##  $ Condition                      : chr  "NonAutonomousCondition" "NonAutonomousCondition" "NonAutonomousCondition" "NonAutonomousCondition" ...
##  $ IsLastEventInReadingSession    : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ DurationMinutes                : num  0.01365 0.09843 0.00593 0.64272 4.4897 ...
##  $ IsAdjustedDuration             : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ NavigationBlockDuration        : num  0.01365 0.11573 0.00593 12.67953 4.49552 ...
##  $ ContinuousEngagementMinutes    : num  0.0137 28.2973 28.2973 28.2973 28.2973 ...
##  $ ContinuousEngagementSeconds    : num  0.819 1697.837 1697.837 1697.837 1697.837 ...
##  $ Engagement                     : chr  "Disengagement" "Engagement" "Engagement" "Engagement" ...
##  $ PotentialReadingSessionArtefact: logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ EndLocation                    : int  0 861 1893 861 17126 18209 32363 32363 32363 27657 ...
##  $ VisibleColumns                 : chr  "TwoColumns" "TwoColumns" "TwoColumns" "TwoColumns" ...
##  $ CumulativeRSTime               : num  0.0137 0.1121 0.1353 0.778 17.3045 ...
##  $ IsEngagement                   : logi  FALSE TRUE TRUE TRUE TRUE TRUE ...
##  $ CumulativeEngagementTimeInRS   : num  0 0.0984 0.1217 0.7644 17.2909 ...
##  $ CumulativeEngagementTime       : num  0 0.0984 0.1217 0.7644 17.2909 ...
##  $ Author                         : chr  "MaryEWilkinsFreeman" "MaryEWilkinsFreeman" "MaryEWilkinsFreeman" "MaryEWilkinsFreeman" ...
##  $ Title                          : chr  "The Yates Pride" "The Yates Pride" "The Yates Pride" "The Yates Pride" ...
##  $ AverageWordFrequency           : num  4.07 4.07 4.07 4.07 4.07 ...
##  $ SDWordFrequency                : num  14.2 14.2 14.2 14.2 14.2 ...
##  $ NumberOfWords                  : int  1458 1458 1458 1458 1458 1458 1458 1458 1458 1458 ...
##  $ NumberOfSentences              : int  461 461 461 461 461 461 461 461 461 461 ...
##  $ AverageSentenceLength          : num  15.4 15.4 15.4 15.4 15.4 ...
##  $ SDSentenceLength               : num  11.4 11.4 11.4 11.4 11.4 ...
##  $ CharacterLength                : int  32363 32363 32363 32363 32363 32363 32363 32363 32363 32363 ...
##  $ WordLength                     : int  7253 7253 7253 7253 7253 7253 7253 7253 7253 7253 ...
##  $ AverageRating                  : num  2.65 2.65 2.65 2.65 2.65 2.65 2.65 2.65 2.65 2.65 ...
##  $ SDRating                       : num  1.13 1.13 1.13 1.13 1.13 ...
##  $ MedianRating                   : int  2 2 2 2 2 2 2 2 2 2 ...
##  $ MinRating                      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ MaxRating                      : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ PublicationYear                : int  1912 1912 1912 1912 1912 1912 1912 1912 1912 1912 ...
##  $ PublicationType                : chr  "PublicDomain" "PublicDomain" "PublicDomain" "PublicDomain" ...
##  $ Genre                          : chr  "Romance" "Romance" "Romance" "Romance" ...
##  $ Percentage                     : num  0 0 0.0266 0 0.4985 ...
##  $ NBFirstStartLocation           : int  0 0 861 0 16134 17126 31630 31630 31630 26583 ...
##  $ NBLastStartLocation            : int  0 861 861 15327 16134 31630 31630 31630 31630 26583 ...
##  $ NBFirstVisibleCharacterCount   : int  0 861 1032 861 992 1083 733 733 733 1074 ...
##  $ NBLastVisibleCharacterCount    : int  0 1032 1032 807 992 733 733 733 733 1074 ...
##  $ NBFirstReadingBlock            : int  8 9 10 11 26 27 41 41 41 42 ...
##  $ NBLastReadingBlock             : int  8 10 10 25 26 41 41 41 41 42 ...
##  $ NBFirstPage                    : int  0 1 2 1 16 17 31 31 31 26 ...
##  $ NBLastPage                     : int  0 2 2 15 16 31 31 31 31 26 ...
##  $ NBFirstPagesInSection          : int  0 31 31 31 31 31 31 31 31 31 ...
##  $ NBLastPagesInSection           : int  0 31 31 31 31 31 31 31 31 31 ...
##  $ NBEndLocation                  : int  0 1893 1893 16134 17126 32363 32363 32363 32363 27657 ...
##  $ NBAverageReadingSpeed          : num  NA 5819.6 14035.1 285.1 54.1 ...
##  $ NBFirstCumulativeRSTime        : num  0.0137 0.1121 0.1353 0.778 17.3045 ...
##  $ NBLastCumulativeRSTime         : num  0.0137 0.1294 0.1353 12.8148 17.3104 ...
##  $ NBDuration                     : num  0.01365 0.11573 0.00593 12.67953 4.49552 ...
##  $ NBSpeedLabel                   : chr  "" "Scanning" "Scanning" "DeepReading" ...
##  $ NBAnyProgressBarUsage          : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ NBFirstTimeBeforeDeadline      : num  3.99 3.99 3.99 3.99 3.98 ...
##  $ NBLastTimeBeforeDeadline       : num  3.99 3.99 3.99 3.98 3.97 ...
source(
    paste0(
        mypath_SSRB,
        "/Functions/Functions_VariableTypeConversion.R"
    )
)

## turn user and story indicators into factors
grouped_tracking_data[, c(
    "UserId",
    "StoryId"
)] <- convert.magic(
    grouped_tracking_data[, c(
        "UserId",
        "StoryId"
    )],
    "factor"
)
## turn reading block (page view) number,
### reading session number, and
#### navigation block number into ordered factors
grouped_tracking_data[, c(
    "ReadingBlockNumber",
    "ReadingSessionNumber",
    "NavigationBlockNumber"
)] <- convert.magic(
    grouped_tracking_data[, c(
        "ReadingBlockNumber",
        "ReadingSessionNumber",
        "NavigationBlockNumber"
    )],
    "ordered"
)
## fix date and time variable types
grouped_tracking_data$Date <-
    as.Date(
        grouped_tracking_data$Date,
        format = "%Y-%m-%d"
    )
TempTimeObject <- strsplit(grouped_tracking_data$Time, " ")
TempTimeObject <- sapply(TempTimeObject, "[[", 2)
grouped_tracking_data$TempTimeObject <- TempTimeObject
grouped_tracking_data <- dplyr::select(grouped_tracking_data, -Time)
names(grouped_tracking_data)[which(colnames(grouped_tracking_data) == "TempTimeObject")] <- "Time"
grouped_tracking_data$DateTime <- paste(
    grouped_tracking_data$Date,
    grouped_tracking_data$Time
)
op <-
    options(digits.secs = 3)
grouped_tracking_data$Time <-
    strptime(
        grouped_tracking_data$Time,
        format = "%H:%M:%OS"
    )
grouped_tracking_data <- grouped_tracking_data %>%
    mutate(DateTime = as.POSIXct(DateTime, format = "%Y-%m-%d %H:%M:%OS"))
# order the df by User, date, and time
grouped_tracking_data <-
    grouped_tracking_data[
        with(
            grouped_tracking_data,
            order(UserId, Date, Time)
        ),
    ]

Determine linearity

Next we determine how much the participants’ location shifts in between different navigation blocks. This essentially tells us how much, and to what direction, the participant moved in the text (if at all).

To count the number of pages turned, we first create variables that check whether consecutive navigation blocks (one per row) should be compared. We only want to compare navigation blocks from a) the same user b) the same reading session and c) the same reading set up (see below).

A reading set up affects the amount of ‘pages’ that the short story has, for example, in a small browser window the story is organised in more ‘pages’ than in a large browser window. Pages turned should only be determined between events that have the same reading set up. Note that this does not limit our ability to notice linearity of navigation as changing the reading set up will either result in an event that is in its own navigation block (which can then be compared to a subsequent navigation block; this happens e.g. when the participant resizes a browser window), or a change in reading set up starts a new reading session (e.g. when the participant changes device).

First, we create tests that will be useful in determining linearity:

grouped_tracking_data$IsNewUser <-
    (
        grouped_tracking_data$UserId !=
            (lag(grouped_tracking_data$UserId, 1))
    )
grouped_tracking_data$IsNewSession <-
    (
        grouped_tracking_data$ReadingSessionNumber !=
            (lag(grouped_tracking_data$ReadingSessionNumber, 1))
    )
grouped_tracking_data$IsReadingSetUpChange <-
    (
        grouped_tracking_data$NBFirstPagesInSection !=
            (lag(grouped_tracking_data$NBFirstPagesInSection, 1))
    )
# first event is TRUE for all 3 tests
grouped_tracking_data[1, "IsNewUser"] <- TRUE
grouped_tracking_data[1, "IsNewSession"] <- TRUE
grouped_tracking_data[1, "IsReadingSetUpChange"] <- TRUE

We can then calculate the number of ‘pages’ turned between navigation blocks. The variable is calculated by subtracting a navigation block’s start location (in pages) from the next navigation block’s start location. If the reading set up, user, or reading session changes between the two navigation blocks and the navigation block includes movement, ‘PagesTurned’ is instead calculated by comparing the first page in a navigation block to the last page in the same navigation block.

# count the number of pages turned within a navigation block
grouped_tracking_data$PagesTurned <-
    ifelse(
        ( # following navigation block is from
            ## same user,
            ## same reading session, and
            ## same reading set up
            !lead(grouped_tracking_data$IsNewUser, 1) &
                !lead(grouped_tracking_data$IsNewSession, 1) &
                !lead(grouped_tracking_data$IsReadingSetUpChange, 1)
        ),
        (
            lead(grouped_tracking_data$NBFirstPage, 1) -
                grouped_tracking_data$NBFirstPage
        ),
        ifelse(
            ( # the navigation block includes movement but
                ## the navigation block following it has a
                ### different user,
                ### different reading session, or
                ### different reading set up
                grouped_tracking_data$NBFirstPage !=
                    grouped_tracking_data$NBLastPage
            ),
            (
                grouped_tracking_data$NBLastPage -
                    grouped_tracking_data$NBFirstPage
            ),
            # the navigation block doesn't include movement
            0
        )
    )
# Manually set value for PagesTurned on last row of data
LastRow <- nrow(grouped_tracking_data)
grouped_tracking_data[LastRow, "PagesTurned"] <- (
    grouped_tracking_data[LastRow, "NBLastPage"] -
        grouped_tracking_data[LastRow, "NBFirstPage"]
)
# Check PagesTurned
## - regression, + forward leap or a chronological page turn, 0 not navigation
table(
    sign(grouped_tracking_data$PagesTurned)
)
## 
##   -1    0    1 
##  531 1489 1155
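
The logic of the comparison tests and of PagesTurned can be traced on a small example. The sketch below uses a hypothetical data frame (toy_blocks) and base R helpers (previous_value, next_value) in place of lag and lead. It collapses the three tests into a single IsBreak column and simplifies the nested ifelse, relying on the fact that NBLastPage - NBFirstPage is already 0 when a block contains no movement.

```r
# Hypothetical navigation blocks (one per row); the set up changes on row 4
toy_blocks <- data.frame(
    UserId = c(1, 1, 1, 1),
    ReadingSessionNumber = c(1, 1, 1, 1),
    NBFirstPagesInSection = c(31, 31, 31, 20),
    NBFirstPage = c(1, 2, 5, 3),
    NBLastPage = c(1, 4, 7, 3)
)

# Base R stand-ins for lag and lead by one row
previous_value <- function(x) c(NA, head(x, -1))
next_value <- function(x) c(tail(x, -1), NA)

# A row cannot be compared with the previous row if the user,
# the reading session, or the reading set up differs
toy_blocks$IsBreak <-
    toy_blocks$UserId != previous_value(toy_blocks$UserId) |
    toy_blocks$ReadingSessionNumber !=
        previous_value(toy_blocks$ReadingSessionNumber) |
    toy_blocks$NBFirstPagesInSection !=
        previous_value(toy_blocks$NBFirstPagesInSection)
toy_blocks$IsBreak[1] <- TRUE # the first row has no previous row

next_is_break <- next_value(toy_blocks$IsBreak)
next_is_break[is.na(next_is_break)] <- TRUE # nothing follows the last row

# Compare with the next block where possible, otherwise within the block
toy_blocks$PagesTurned <- ifelse(
    !next_is_break,
    next_value(toy_blocks$NBFirstPage) - toy_blocks$NBFirstPage,
    toy_blocks$NBLastPage - toy_blocks$NBFirstPage
)
toy_blocks$PagesTurned
```

Rows 1 and 2 are compared with the block that follows them, whereas rows 3 and 4 fall back to the within-block difference because the next row is a break or does not exist.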

PagesTurned tells us the direction and extent of navigation. We use this information to assign each navigation event a label of ‘Regression’ or ‘ForwardLeap’.

Considering that regressions refer to any movement backwards in text, any navigation block with a negative value for PagesTurned can be labelled as a ‘Regression’ (PagesTurned < 0). Forward leaps include forward movement to a position further than the next page (PagesTurned > 1), either by turning pages at a browsing speed (NBSpeedLabel == “Browsing”) or by using the progress bar (NBAnyProgressBarUsage is TRUE).

grouped_tracking_data$IsRegression <-
    (
        grouped_tracking_data$PagesTurned < 0
    )
table(grouped_tracking_data$IsRegression)
## 
## FALSE  TRUE 
##  2644   531
grouped_tracking_data$IsForwardLeap <-
    (
        ((grouped_tracking_data$PagesTurned > 1) &
            (grouped_tracking_data$NBSpeedLabel == "Browsing")) |
            ((grouped_tracking_data$PagesTurned > 1) &
                (grouped_tracking_data$NBAnyProgressBarUsage))
    )
table(grouped_tracking_data$IsForwardLeap)
## 
## FALSE  TRUE 
##  3103    72

The data includes 531 regressions and 72 forward leaps.
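
To make the two labelling rules concrete, the sketch below applies them to a hypothetical data frame (toy_blocks) with made-up values. The forward-leap condition is written in factored form, PagesTurned > 1 combined with either of the two methods, which is logically equivalent to the condition used above.

```r
# Hypothetical navigation blocks (illustrative values only)
toy_blocks <- data.frame(
    PagesTurned = c(1, -3, 4, 6, 2, 0),
    NBSpeedLabel = c(
        "DeepReading", "Scanning", "Browsing",
        "DeepReading", "Scanning", ""
    ),
    NBAnyProgressBarUsage = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE)
)

# Regression: any backward movement
toy_blocks$IsRegression <- toy_blocks$PagesTurned < 0

# Forward leap: more than one page forward,
# either at browsing speed or with the progress bar
toy_blocks$IsForwardLeap <- toy_blocks$PagesTurned > 1 &
    (toy_blocks$NBSpeedLabel == "Browsing" |
        toy_blocks$NBAnyProgressBarUsage)

toy_blocks[, c("PagesTurned", "IsRegression", "IsForwardLeap")]
```

The fifth row moves two pages forward but is neither browsing nor a progress bar jump, so it is not labelled as a forward leap.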

Our aim is to measure linearity on each page-view, and so we need to merge grouped_tracking_data with wrangled_tracking_data. First, we load in wrangled_tracking_data:

tracking_data <-
    read.csv(
        paste0(
            mypath_SSRB,
            "/Data/wrangled_tracking_data.csv"
        ),
        header = TRUE,
        sep = ";",
        dec = ","
    )
tracking_data <- dplyr::select(tracking_data, -X, -X.1)
## turn user and story indicators into factors
tracking_data[, c(
    "UserId",
    "StoryId"
)] <- convert.magic(
    tracking_data[, c(
        "UserId",
        "StoryId"
    )],
    "factor"
)
## turn reading block (page view) number,
### reading session number, and
#### navigation block number into ordered factors
tracking_data[, c(
    "ReadingBlockNumber",
    "ReadingSessionNumber",
    "NavigationBlockNumber"
)] <- convert.magic(
    tracking_data[, c(
        "ReadingBlockNumber",
        "ReadingSessionNumber",
        "NavigationBlockNumber"
    )],
    "ordered"
)
## fix date and time variable types
tracking_data$Date <-
    as.Date(
        tracking_data$Date,
        format = "%Y-%m-%d"
    )
TempTimeObject <- strsplit(tracking_data$Time, " ")
TempTimeObject <- sapply(TempTimeObject, "[[", 2)
tracking_data$TempTimeObject <- TempTimeObject
tracking_data <- dplyr::select(tracking_data, -Time)
names(tracking_data)[which(colnames(tracking_data) == "TempTimeObject")] <- "Time"
tracking_data$DateTime <- paste(
    tracking_data$Date,
    tracking_data$Time
)
op <-
    options(digits.secs = 3)
tracking_data$Time <-
    strptime(
        tracking_data$Time,
        format = "%H:%M:%OS"
    )
tracking_data <- tracking_data %>%
    mutate(DateTime = as.POSIXct(DateTime, format = "%Y-%m-%d %H:%M:%OS"))

Merge the dataframes together:

# columns selected from grouped_tracking_data:
## UserId, NavigationBlockNumber, ReadingSessionNumber
### IsRegression, IsForwardLeap, PagesTurned,
### NBFirstCumulativeRSTime, NBLastCumulativeRSTime, NBSpeedLabel
# Identify these columns first
column_UserId <- which(colnames(grouped_tracking_data) == "UserId")
column_NBNumber <- which(colnames(grouped_tracking_data) == "NavigationBlockNumber")
column_RSNumber <- which(colnames(grouped_tracking_data) == "ReadingSessionNumber")
column_IsRegression <- which(colnames(grouped_tracking_data) == "IsRegression")
column_IsForwardLeap <- which(colnames(grouped_tracking_data) == "IsForwardLeap")
column_PagesTurned <- which(colnames(grouped_tracking_data) == "PagesTurned")
column_NBFirstCumulativeRSTime <- which(colnames(grouped_tracking_data) == "NBFirstCumulativeRSTime")
column_NBLastCumulativeRSTime <- which(colnames(grouped_tracking_data) == "NBLastCumulativeRSTime")
column_NBSpeedLabel <- which(colnames(grouped_tracking_data) == "NBSpeedLabel")

# Merge dfs:
tracking_data <-
    merge(
        tracking_data,
        grouped_tracking_data[, c(
            column_UserId,
            column_NBNumber,
            column_RSNumber,
            column_IsRegression,
            column_IsForwardLeap,
            column_PagesTurned,
            column_NBFirstCumulativeRSTime,
            column_NBLastCumulativeRSTime,
            column_NBSpeedLabel
        )],
        by = c("UserId", "ReadingSessionNumber", "NavigationBlockNumber"),
        all.x = TRUE
    )

Next, we create columns that summarise information on nonlinearity based on IsRegression and IsForwardLeap: (1) IsNonlinearNavigation tells us whether the event includes a regression or a forward leap; (2) Linearity tells us the type of linearity (regression, forward leap, or linear/nonnavigation); (3) StartsNonlinearity tells us whether the event initiates nonlinear navigation following linear navigation, nonnavigation, or nonlinear navigation of a different type.

Calculate (1) IsNonlinearNavigation:

tracking_data$IsNonlinearNavigation <-
    ifelse(
        (tracking_data$IsRegression |
            tracking_data$IsForwardLeap),
        TRUE,
        FALSE
    )
table(tracking_data$IsNonlinearNavigation)
## 
## FALSE  TRUE 
##  6424  2350

2350 of the 8774 events include nonlinear navigation of the text (26.78%).

Usage of nonlinear navigation varies between participants:

table(tracking_data$IsNonlinearNavigation, tracking_data$UserId)
##        
##           6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23
##   FALSE  74 128 122  56  75  90 170 116 433  73  74 118  90  60  78 160 112  44
##   TRUE    7 136   8  21   6  17 229  63  28  20  11  20  17  13  18   8  56   3
##        
##          24  28  29  30  31  32  34  35  36  37  38  39  40  41  42  43  44  45
##   FALSE  91 171 131  47  52 143 102 103  93  67 123 136  94  69  81 113 133  71
##   TRUE    2 146  20   0  23  19  30 144  29   4  26  11  22   8  10   9 162  92
##        
##          46  47  48  49  50  51  52  53  55  57  58  60  62  63  64  65  66  67
##   FALSE 169  66  64  54  83 144 120  69  69  83  89 248  79  88 243 162 102  87
##   TRUE    8   4   3   0  18 110  15   2  18  10  68 172   4  21   6 171  13  14
##        
##          72  74  75  76  77  92
##   FALSE  86  92 145  72  58  59
##   TRUE   56   0 150  31  16   2
ggplot(tracking_data, aes(x = IsNonlinearNavigation, group = UserId, fill = UserId)) +
    geom_bar(position = position_dodge(), stat = "count") +
    theme_classic()

Indeed, the plot indicates that some participants use nonlinearity quite often whereas others use it very rarely.

Calculate (2) Linearity:

tracking_data$Linearity <-
    ifelse(
        tracking_data$IsRegression,
        "Regression",
        ifelse(
            tracking_data$IsForwardLeap,
            "ForwardLeap",
            "LinearOrNonNavigation"
        )
    )
table(tracking_data$Linearity)
## 
##           ForwardLeap LinearOrNonNavigation            Regression 
##                   687                  6424                  1663

Of the 2350 nonlinear navigation events, 1663 are regressions (70.77%), and the remaining 687 are forward leaps (29.23%).

Usage of the different linearity types again varies between participants:

table(tracking_data$Linearity, tracking_data$UserId)
##                        
##                           6   7   8   9  10  11  12  13  14  15  16  17  18  19
##   ForwardLeap             5  44   0   7   1   5  93  20   0   0   0   1   7   0
##   LinearOrNonNavigation  74 128 122  56  75  90 170 116 433  73  74 118  90  60
##   Regression              2  92   8  14   5  12 136  43  28  20  11  19  10  13
##                        
##                          20  21  22  23  24  28  29  30  31  32  34  35  36  37
##   ForwardLeap             1   0  13   0   0  30   2   0   5   0   5  66   5   0
##   LinearOrNonNavigation  78 160 112  44  91 171 131  47  52 143 102 103  93  67
##   Regression             17   8  43   3   2 116  18   0  18  19  25  78  24   4
##                        
##                          38  39  40  41  42  43  44  45  46  47  48  49  50  51
##   ForwardLeap             8   0   1   0   0   1  72  39   1   0   0   0   3  28
##   LinearOrNonNavigation 123 136  94  69  81 113 133  71 169  66  64  54  83 144
##   Regression             18  11  21   8  10   8  90  53   7   4   3   0  15  82
##                        
##                          52  53  55  57  58  60  62  63  64  65  66  67  72  74
##   ForwardLeap             1   0   0   0  15  23   0   4   0  88   0   0   7   0
##   LinearOrNonNavigation 120  69  69  83  89 248  79  88 243 162 102  87  86  92
##   Regression             14   2  18  10  53 149   4  17   6  83  13  14  49   0
##                        
##                          75  76  77  92
##   ForwardLeap            74  12   0   0
##   LinearOrNonNavigation 145  72  58  59
##   Regression             76  19  16   2

Note that the number of events is connected to participants’ device size, and so users’ event counts are not directly comparable.

To calculate (3) StartsNonlinearity we first order tracking_data and create tests:

# order the df by User, date, time, and Id
tracking_data <-
    tracking_data[
        with(
            tracking_data,
            order(UserId, Date, Time, Id)
        ),
    ]
tracking_data$IsNewUser <-
    (
        tracking_data$UserId !=
            (lag(tracking_data$UserId, 1))
    )
tracking_data$IsNewSession <-
    (
        tracking_data$ReadingSessionNumber !=
            (lag(tracking_data$ReadingSessionNumber, 1))
    )
tracking_data$IsReadingSetUpChange <-
    (
        tracking_data$TotalPagesInSection !=
            (lag(tracking_data$TotalPagesInSection, 1))
    )
# first event is TRUE for all 3 tests
tracking_data[1, "IsNewUser"] <- TRUE
tracking_data[1, "IsNewSession"] <- TRUE
tracking_data[1, "IsReadingSetUpChange"] <- TRUE

Then, use tests in calculating StartsNonlinearity:

## Find events that initiate nonlinearity
for (row in 1:nrow(tracking_data)) {
    if (row == 1 |
        tracking_data[row, "IsNewUser"] |
        tracking_data[row, "IsNewSession"] |
        tracking_data[row, "IsReadingSetUpChange"]) {
        # first row, new user, new session, or new set up
        if (tracking_data[row, "IsNonlinearNavigation"]) {
            # nonlinear navigation
            tracking_data[row, "StartsNonlinearity"] <-
                TRUE
        } else {
            # not nonlinear
            tracking_data[row, "StartsNonlinearity"] <-
                FALSE
        }
    } else {
        # same user, reading session and reading set up
        if ((tracking_data[row - 1, "Linearity"] !=
            tracking_data[row, "Linearity"]) &
            tracking_data[row, "IsNonlinearNavigation"]) {
            # linearity type changes between previous and current event
            ## and current event is nonlinear
            tracking_data[row, "StartsNonlinearity"] <-
                TRUE
        } else {
            # the nonlinear event doesn't start nonlinearity
            tracking_data[row, "StartsNonlinearity"] <-
                FALSE
        }
    }
}
table(tracking_data$StartsNonlinearity)
## 
## FALSE  TRUE 
##  8250   524

Out of all events, 524 initiate nonlinearity (5.97%).

table(tracking_data$StartsNonlinearity, tracking_data$UserId)
##        
##           6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23
##   FALSE  78 246 126  75  77 102 369 161 449  85  80 129 104  68  88 165 154  46
##   TRUE    3  18   4   2   4   5  30  18  12   8   5   9   3   5   8   3  14   1
##        
##          24  28  29  30  31  32  34  35  36  37  38  39  40  41  42  43  44  45
##   FALSE  92 274 139  47  69 153 123 240 112  69 141 143 104  74  86 116 288 153
##   TRUE    1  43  12   0   6   9   9   7  10   2   8   4  12   3   5   6   7  10
##        
##          46  47  48  49  50  51  52  53  55  57  58  60  62  63  64  65  66  67
##   FALSE 169  68  65  54  92 219 126  70  78  87 146 401  81 101 245 308 110  95
##   TRUE    8   2   2   0   9  35   9   1   9   6  11  19   2   8   4  25   5   6
##        
##          72  74  75  76  77  92
##   FALSE 124  92 273  94  67  60
##   TRUE   18   0  22   9   7   1

StartsNonlinearity also varies between participants. This measure is not connected to participants’ number of events and so we can compare it between participants.
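
The rule applied row by row in the loop above can also be expressed with vectorised comparisons. The following is only a sketch under simplifying assumptions: the data frame toy_events is hypothetical, it contains no missing values, and the session and set-up tests are left out so that a change of user is the only kind of break.

```r
# Hypothetical events from two users (illustrative values only)
toy_events <- data.frame(
    UserId = c(1, 1, 1, 1, 2, 2),
    Linearity = c(
        "LinearOrNonNavigation", "Regression", "Regression",
        "ForwardLeap", "ForwardLeap", "LinearOrNonNavigation"
    )
)
toy_events$IsNonlinearNavigation <-
    toy_events$Linearity != "LinearOrNonNavigation"

# Previous row's values (NA on the first row)
previous_user <- c(NA, head(toy_events$UserId, -1))
previous_linearity <- c(NA, head(toy_events$Linearity, -1))

# TRUE when the row cannot be compared with the previous row
is_break <- is.na(previous_user) | toy_events$UserId != previous_user

# A nonlinear event starts nonlinearity if it follows a break
# or if its linearity category differs from the previous event's
toy_events$StartsNonlinearity <- toy_events$IsNonlinearNavigation &
    (is_break | toy_events$Linearity != previous_linearity)

toy_events$StartsNonlinearity
```

The forward leap on row 5 counts as an initiation even though row 4 is also a forward leap, because the two rows belong to different users.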

Create a summary dataframe to compare participants in their linearity:

participant_navigation_counts <- tracking_data %>%
    group_by(UserId) %>%
    summarise(
        StartsNonlinearityCount = sum(StartsNonlinearity),
        IsNonlinearNavigationCount = sum(IsNonlinearNavigation),
        IsRegressionCount = sum(IsRegression),
        IsForwardLeapCount = sum(IsForwardLeap),
        EventCount = n()
    )
StaticPlot <- ggplot(
    participant_navigation_counts,
    aes(x = IsNonlinearNavigationCount, y = StartsNonlinearityCount, colour = UserId)
) +
    geom_point() +
    theme_classic()
ggplotly(StaticPlot)

IsNonlinearNavigation and StartsNonlinearity are strongly correlated when there are only a few nonlinear navigation events. This indicates that, for these participants, nonlinear events tend to occur separately from each other rather than in long consecutive runs. For example, participant id 55 has 18 nonlinear navigation events and 9 of them initiate nonlinearity. This indicates that on average, participant id 55 used two nonlinear navigation events consecutively (18/9 = 2).

However, this connection is less apparent for the participants who have more nonlinear events. For example, participant id 7 has 136 nonlinear navigation events but only 18 initiating events, an average of roughly 7.6 nonlinear events per initiated nonlinearity (136/18). In contrast, participant id 51 used nonlinear navigation 110 times and initiated nonlinearity 35 times, an average of roughly 3.1 nonlinear events per initiated nonlinearity (110/35).
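
The averages quoted for participants 55, 7, and 51 can be reproduced from their counts in the tables above. The data frame example_counts and the column EventsPerInitiation are hypothetical names used only for this sketch.

```r
# Counts for three participants, copied from the tables above
example_counts <- data.frame(
    UserId = c("55", "7", "51"),
    IsNonlinearNavigationCount = c(18, 136, 110),
    StartsNonlinearityCount = c(9, 18, 35)
)

# Average number of nonlinear events per initiated nonlinearity
example_counts$EventsPerInitiation <-
    example_counts$IsNonlinearNavigationCount /
        example_counts$StartsNonlinearityCount

round(example_counts$EventsPerInitiation, 2)
```

If the same ratio were computed for every participant, those with no nonlinear events at all would yield NaN (0/0) and would need to be handled separately.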

Finally, to create a measure for linearity to use in the analysis, we filter the dataframe to only include one event per page-view. We therefore remove events that occur outside of visible page-views, such as disengagements and dialog events (triggered by viewing the information sheet). We then select one event per page-view that includes information on linearity (in particular, “StartsNonlinearity”).

# Remove events that do not occur on a page-view
## 8144 rows (full data 8774 rows)
linearity_measure_data <- tracking_data %>%
    filter(
        Engagement != "Disengagement" &
            Engagement != "Dialog"
    )
# make sure the df is correctly ordered
linearity_measure_data <-
    linearity_measure_data[
        with(
            linearity_measure_data,
            order(UserId, Date, Time, Id)
        ),
    ]
# Select one event per each page-view:
## group events by
### UserId, StoryId, ReadingSessionNumber, ReadingBlockNumber (page-view indicator), and Condition
#### and summarise other important variables into columns
linearity_measure_data <- linearity_measure_data %>%
    group_by(UserId, StoryId, ReadingSessionNumber, ReadingBlockNumber, Condition) %>%
    summarise(
        StartLocation = first(StartLocation),
        EndLocation = first(EndLocation),
        IsNewUser = any(IsNewUser),
        IsNewSession = any(IsNewSession),
        IsReadingSetUpChange = any(IsReadingSetUpChange),
        IncludesNonlinearity = any(IsNonlinearNavigation),
        StartsNonlinearity = any(StartsNonlinearity),
        FirstTimeUntilDeadlineDays = first(TimeBeforeDeadlinesDays),
        FirstCumulativeRSTime = first(CumulativeRSTime),
        WindowWidth = first(WindowWidth)
    )
## `summarise()` has grouped output by 'UserId', 'StoryId',
## 'ReadingSessionNumber', 'ReadingBlockNumber'. You can override using the
## `.groups` argument.

Save linearity measure df

The new df is saved for usage in analysis. The dataset has already been saved, and so the code chunk below is not run.

# write.csv2(
#     linearity_measure_data,
#     "linearity_measure_data.csv"
# )
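
For reference, write.csv2 writes a semicolon-separated file with a comma as the decimal mark, which matches the sep and dec arguments passed to read.csv earlier in this script. The sketch below shows a round trip on a hypothetical data frame (toy_measure), written to a temporary file so that nothing in the project directory is touched.

```r
# Hypothetical data frame standing in for linearity_measure_data
toy_measure <- data.frame(
    UserId = c(6, 7),
    FirstCumulativeRSTime = c(0.5, 12.25)
)

# Write to a temporary file rather than to the project directory
toy_path <- tempfile(fileext = ".csv")
write.csv2(toy_measure, toy_path)

# Read back with the same separator and decimal mark as used for the wrangled data
toy_read <- read.csv(toy_path, header = TRUE, sep = ";", dec = ",")
names(toy_read)
```

Because write.csv2 writes row names by default, the file gains an unnamed first column that read.csv labels X. This may be where the X columns dropped after loading the wrangled files come from, although that is an assumption about how those files were written.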